18.05 Lecture 32 
May 2, 2005 



Two-sample t-test 

$X_1, \ldots, X_m \sim N(\mu_1, \sigma^2)$

$Y_1, \ldots, Y_n \sim N(\mu_2, \sigma^2)$

Samples are independent. 

Compare the means of the distributions. 

Hypothesis Tests: 

$H_1: \mu_1 = \mu_2$; $H_2: \mu_1 \neq \mu_2$ (two-sided) or $H_2: \mu_1 > \mu_2$ (one-sided)



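By properties of the Normal distribution and Fisher's theorem:

$$\frac{\sqrt{m}(\bar{X} - \mu_1)}{\sigma} \sim N(0, 1), \qquad \frac{\sqrt{n}(\bar{Y} - \mu_2)}{\sigma} \sim N(0, 1)$$

$$\frac{m\hat{\sigma}_x^2}{\sigma^2} \sim \chi^2_{m-1}, \qquad \frac{n\hat{\sigma}_y^2}{\sigma^2} \sim \chi^2_{n-1}$$

where $\hat{\sigma}_x^2 = \frac{1}{m}\sum_{i=1}^m (X_i - \bar{X})^2$ and $\hat{\sigma}_y^2 = \frac{1}{n}\sum_{j=1}^n (Y_j - \bar{Y})^2$. By Fisher's theorem each sample mean is independent of its sample variance, and the two samples are independent, so all four quantities are independent.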



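Calculate $\bar{X} - \bar{Y}$:

$$\frac{\bar{X} - \mu_1}{\sigma} \sim \frac{1}{\sqrt{m}}N(0, 1) = N\!\left(0, \tfrac{1}{m}\right), \qquad \frac{\bar{Y} - \mu_2}{\sigma} \sim N\!\left(0, \tfrac{1}{n}\right)$$

$$\frac{\bar{X} - \mu_1}{\sigma} - \frac{\bar{Y} - \mu_2}{\sigma} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma} \sim N\!\left(0, \tfrac{1}{m} + \tfrac{1}{n}\right)$$

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim N(0, 1)$$

Adding the two independent chi-square variables:

$$\frac{m\hat{\sigma}_x^2}{\sigma^2} + \frac{n\hat{\sigma}_y^2}{\sigma^2} \sim \chi^2_{m+n-2}$$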
Construct the t-statistic:

$$\frac{N(0, 1)}{\sqrt{\chi^2_{m+n-2}/(m+n-2)}} \sim t_{m+n-2}$$

where the $N(0, 1)$ and $\chi^2_{m+n-2}$ variables are independent.
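The construction can be checked numerically. The sketch below is a Monte Carlo illustration, not part of the lecture: the means, $\sigma$, number of replications and seed are arbitrary assumptions, and only $m = 14$, $n = 5$ echo the soil example. It draws both samples repeatedly and checks that the standardized difference behaves like $N(0,1)$, that the pooled chi-square piece has mean $m + n - 2$, and that the ratio has variance close to $\nu/(\nu - 2)$, the variance of $t_\nu$.

```python
import math
import random
import statistics


def simulate_t_pieces(m, n, mu1, mu2, sigma, reps, seed=0):
    """Simulate Z, V and Z / sqrt(V / (m + n - 2)) over `reps` replications."""
    rng = random.Random(seed)
    zs, vs, ts = [], [], []
    for _ in range(reps):
        xs = [rng.gauss(mu1, sigma) for _ in range(m)]
        ys = [rng.gauss(mu2, sigma) for _ in range(n)]
        xbar, ybar = sum(xs) / m, sum(ys) / n
        # m * sigma_hat_x^2 is the sum of squared deviations from the sample mean
        ssx = sum((x - xbar) ** 2 for x in xs)
        ssy = sum((y - ybar) ** 2 for y in ys)
        z = ((xbar - ybar) - (mu1 - mu2)) / (sigma * math.sqrt(1 / m + 1 / n))
        v = (ssx + ssy) / sigma ** 2          # should follow chi^2_{m+n-2}
        zs.append(z)
        vs.append(v)
        ts.append(z / math.sqrt(v / (m + n - 2)))
    return zs, vs, ts


# Hypothetical parameters chosen only for illustration.
zs, vs, ts = simulate_t_pieces(m=14, n=5, mu1=1.0, mu2=3.0, sigma=2.0, reps=20000)
print(statistics.mean(zs), statistics.variance(zs))  # near 0 and 1
print(statistics.mean(vs))                           # near m + n - 2 = 17
print(statistics.variance(ts))                       # near 17 / 15 for t_17
```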



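Construct the test:
If $H_1$ is true, then $\mu_1 - \mu_2 = 0$ and the unknown $\sigma$ cancels between numerator and denominator, so:

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{1}{m} + \frac{1}{n}}\sqrt{\frac{m\hat{\sigma}_x^2 + n\hat{\sigma}_y^2}{m+n-2}}} \sim t_{m+n-2}$$

$$\delta = \{H_1: -c \le T \le c,\ H_2: \text{otherwise}\}$$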

where the c values come from the t distribution with m + n - 2 degrees of freedom. 

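$c$ is the value where the area under the $t_{m+n-2}$ density to the right of $c$ equals $\alpha/2$: $H_1$ is rejected both below $-c$ and above $+c$, so the two tails together have area $\alpha$.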

If the test were $H_1: \mu_1 \le \mu_2$, $H_2: \mu_1 > \mu_2$,

then $c$ would correspond to an area $\alpha$ in one tail only, since $H_1$ is rejected only above $+c$.
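A minimal sketch of the statistic and both decision rules, assuming the common-variance model above; the function names and the toy data are hypothetical. The critical values passed in (2.776 and 2.132) are the usual table values of $t_4$ for two-sided and one-sided tests at $\alpha = 0.05$, since $m + n - 2 = 4$ here.

```python
import math


def pooled_t_statistic(xs, ys):
    """Two-sample T under H1 (mu1 = mu2), assuming a common variance.

    m * sigma_hat_x^2 and n * sigma_hat_y^2 are the sums of squared
    deviations of each sample from its own mean.
    """
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    pooled_sd = math.sqrt(ss / (m + n - 2))
    return (xbar - ybar) / (math.sqrt(1 / m + 1 / n) * pooled_sd)


def decide_two_sided(t, c):
    """delta = {H1: -c <= T <= c, H2: otherwise}."""
    return "H1" if -c <= t <= c else "H2"


def decide_one_sided(t, c):
    """H1: mu1 <= mu2 vs H2: mu1 > mu2: reject H1 only when T > c."""
    return "H2" if t > c else "H1"


# Hypothetical toy data, not from the lecture.
xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]
t = pooled_t_statistic(xs, ys)          # -3 / sqrt(2/3), about -3.674
print(t)
print(decide_two_sided(t, c=2.776))     # H2: |T| exceeds c
print(decide_one_sided(t, c=2.132))     # H1: T is not above +c
```

The one-sided rule ignores how negative $T$ is, which is exactly the situation in the soil example below.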




There are different functions you can construct to approach the problem, based on different combinations of the data.

This is why statistics is entirely based on your assumptions and the resulting distribution function!

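Example: Testing soil types in different locations by the amount of aluminum oxide present.

$m = 14$, $\bar{x} = 12.56$, $X_i \sim N(\mu_1, \sigma^2)$; $n = 5$, $\bar{y} = 17.32$, $Y_j \sim N(\mu_2, \sigma^2)$

$H_1: \mu_1 \le \mu_2$; $H_2: \mu_1 > \mu_2$; $T = -6.3$, compared with $t_{14+5-2} = t_{17}$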



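The c-value is 1.74 (upper-tail area 0.05 under $t_{17}$); however, this is a one-sided test that rejects $H_1$ only when $T > c$. T is very negative, but we still accept $H_1$.
If the hypotheses were $H_1: \mu_1 \ge \mu_2$, $H_2: \mu_1 < \mu_2$,

then the rejection region would be $T < -1.74$, and the T value of $-6.3$ is far to the left of it. Reject $H_1$.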
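The critical value 1.74 can be reproduced without tables. This sketch assumes $\alpha = 0.05$ and uses a hand-rolled approach (Simpson's rule on the $t$ density plus bisection), not any library routine; the helper names are hypothetical. It then applies both one-sided rules to $T = -6.3$.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry of the density."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    half_area = s * h / 3
    return 0.5 + half_area if x >= 0 else 0.5 - half_area


def t_quantile(prob, df, lo=0.0, hi=50.0):
    """Point c with P(T <= c) = prob (prob > 0.5), found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


df = 14 + 5 - 2
c = t_quantile(0.95, df)        # alpha = 0.05 in one tail (assumed level)
T = -6.3
print(round(c, 3))
# H1: mu1 <= mu2 vs H2: mu1 > mu2 -> reject only if T > c
print("reject H1" if T > c else "accept H1")
# H1: mu1 >= mu2 vs H2: mu1 < mu2 -> reject only if T < -c
print("reject H1" if T < -c else "accept H1")
```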



Goodness-of-fit tests. 

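Setup: Consider $r$ different categories $B_1, \ldots, B_r$ for the random variable.
The probability that a data point takes value $B_i$ is $p_i$, with
$\sum_{i=1}^r p_i = p_1 + \cdots + p_r = 1$.

Hypotheses: $H_1: p_i = p_i^0$ for all $i = 1, \ldots, r$, where the $p_i^0$ are given (hypothesized) values; $H_2$: otherwise.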
Example: (9.1.1) 

3 categories exist, regarding a family's financial situation. 

They are either worse, better, or the same this year as last year.
Data: Worse = 58, Same = 64, Better = 67 ($n = 189$)
Hypotheses: $H_1: p_1 = p_2 = p_3 = \frac{1}{3}$, $H_2$: otherwise.

$N_i$ = number of observations in category $i$.

Under $H_1$ you would expect $N_i \approx np_i$, i.e. $N_1 \approx np_1$, $N_2 \approx np_2$, $N_3 \approx np_3$ (here $np_i = 189/3 = 63$).

Measure using the central limit theorem:

$$\frac{N_i - np_i}{\sqrt{np_i(1 - p_i)}} \to N(0, 1)$$
However, keep in mind that the $N_i$ values are not independent!! (they sum to $n$)
Ignore part of the scaling to account for this (proof beyond scope):



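$$\frac{N_i - np_i}{\sqrt{np_i}} \to \sqrt{1 - p_i}\,N(0, 1) = N(0, 1 - p_i)$$

Pearson's Theorem: Let

$$T = \sum_{i=1}^r \frac{(N_i - np_i^0)^2}{np_i^0} = \frac{(N_1 - np_1^0)^2}{np_1^0} + \cdots + \frac{(N_r - np_r^0)^2}{np_r^0},$$

where the $p_i^0$ are the probabilities specified by $H_1$. As $n \to \infty$:

If $H_1$ is true, then $T \to \chi^2_{r-1}$ (in distribution).

If $H_1$ is not true, then $T \to +\infty$.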
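Both the reduced scaling and the first half of Pearson's theorem can be illustrated by simulation. In this sketch the probabilities, $n$, replication count and seed are hypothetical choices; $H_1$ is true by construction. The scaled count $(N_1 - np_1)/\sqrt{np_1}$ should have variance near $1 - p_1$, and $T$ should have mean near $r - 1$, the mean of $\chi^2_{r-1}$.

```python
import math
import random
import statistics


def simulate_counts(n, probs, rng):
    """Draw n observations over categories 0..r-1; return the counts N_i."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return [draws.count(i) for i in range(len(probs))]


def pearson_T(counts, p0):
    """T = sum_i (N_i - n p_i^0)^2 / (n p_i^0)."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, p0))


rng = random.Random(1)
probs = [0.2, 0.3, 0.5]   # hypothetical probabilities; H1 is true here
n, reps = 200, 4000
scaled, ts = [], []
for _ in range(reps):
    counts = simulate_counts(n, probs, rng)
    # scaled count for category 1: (N_1 - n p_1) / sqrt(n p_1)
    scaled.append((counts[0] - n * probs[0]) / math.sqrt(n * probs[0]))
    ts.append(pearson_T(counts, probs))

print(statistics.variance(scaled))  # close to 1 - p_1 = 0.8
print(statistics.mean(ts))          # close to r - 1 = 2, the mean of chi^2_2
```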



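Proof (of the second statement): if $H_1$ is not true, then $p_i \ne p_i^0$ for some $i$, where $p_i$ is the true probability, and

$$\frac{N_i - np_i^0}{\sqrt{np_i^0}} = \frac{N_i - np_i}{\sqrt{np_i^0}} + \frac{n(p_i - p_i^0)}{\sqrt{np_i^0}}$$

The first term is approximately normal by the central limit theorem, while the second equals $\sqrt{n}(p_i - p_i^0)/\sqrt{p_i^0} \to \pm\infty$. However, once squared, this term goes to $+\infty$, so $T \to +\infty$.

Decision Rule:

$$\delta = \{H_1: T \le c,\ H_2: T > c\}$$

where $c$ is the point of $\chi^2_{r-1}$ with upper-tail area $\alpha$.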
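The divergence under $H_2$ can be seen without randomness. This sketch plugs in idealized counts $N_i = np_i$ for a hypothetical true distribution that differs from $p^0$; then $T = n\sum_i (p_i - p_i^0)^2/p_i^0$, which here is $0.125\,n$. The cutoff $c = 5.99$ is the approximate upper 5% point of $\chi^2_2$, used only for illustration.

```python
def pearson_T(counts, p0):
    """T = sum_i (N_i - n p_i^0)^2 / (n p_i^0)."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, p0))


def decide(T, c):
    """delta = {H1: T <= c, H2: T > c}."""
    return "H1" if T <= c else "H2"


p0 = [1 / 3, 1 / 3, 1 / 3]        # probabilities specified by H1
p_true = [0.5, 0.25, 0.25]        # hypothetical true probabilities (H1 false)
for n in (100, 1000, 10000):
    counts = [n * p for p in p_true]   # idealized counts N_i = n p_i
    T = pearson_T(counts, p0)
    # T = n * sum (p_i - p_i^0)^2 / p_i^0 = 0.125 n, so it grows without bound
    print(n, T, decide(T, c=5.99))
```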




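The example yields $T = \frac{(58-63)^2 + (64-63)^2 + (67-63)^2}{63} = \frac{42}{63} \approx 0.667$, compared with $\chi^2_{r-1} = \chi^2_{3-1} = \chi^2_2$.

$c$ is much larger (for example, $c \approx 5.99$ at $\alpha = 0.05$), therefore accept $H_1$.

The difference among the categories is not significant.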
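The arithmetic of the example is reproduced below. One assumption is added: the level $\alpha = 0.05$, which the example does not state. For $r = 3$ the limiting law is $\chi^2_2$, whose CDF is $1 - e^{-x/2}$, so the critical value has the closed form $c = -2\ln\alpha$ and no table is needed.

```python
import math

counts = {"worse": 58, "same": 64, "better": 67}
n = sum(counts.values())           # 189
expected = n / 3                   # n p_i with p_i = 1/3 under H1, i.e. 63
T = sum((N - expected) ** 2 / expected for N in counts.values())

# chi^2_2 has CDF 1 - exp(-x/2), so P(chi^2_2 > c) = alpha gives c = -2 ln(alpha)
alpha = 0.05                       # assumed significance level
c = -2 * math.log(alpha)
print(round(T, 3), round(c, 3))    # 0.667 5.991
print("accept H1" if T <= c else "reject H1")
```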



End of Lecture 32 